Abstract

We give subquadratic algorithms that, given two necklaces each with $n$ beads at arbitrary
positions, compute the optimal rotation of the necklaces to best align the beads. Here alignment
is measured according to the $\ell_p$ norm of the vector of distances between pairs of beads from
opposite necklaces in the best perfect matching. We show surprisingly different results for $p = 1$,
$p$ even, and $p = \infty$. For $p$ even, we reduce the problem to standard convolution, while for $p = \infty$
and $p = 1$, we reduce the problem to (min, +) convolution and (median, +) convolution. Then
we solve the latter two convolution problems in subquadratic time, which are interesting results
in their own right. These results shed some light on the classic sorting $X + Y$ problem, because
the convolutions can be viewed as computing order statistics on the antidiagonals of the $X + Y$
matrix. All of our algorithms run in $o(n^2)$ time, whereas the obvious algorithms for these
problems run in $\Omega(n^2)$ time.
1 Introduction



How should we rotate two necklaces, each with $n$ beads at different locations, to best align the
beads? More precisely, each necklace is represented by a set of $n$ points on the unit-circumference
circle, and the goal is to find rotations of the necklaces, and a perfect matching between the beads
of the two necklaces, that minimize some norm of the circular distances between matched beads.
In particular, the $\ell_1$ norm minimizes the average absolute circular distance between matched beads,
the $\ell_2$ norm minimizes the average squared circular distance between matched beads, and the $\ell_\infty$
norm minimizes the maximum circular distance between matched beads. The $\ell_1$ version of this
necklace alignment problem was introduced by Toussaint [39] in the context of comparing rhythms
in computational music theory, with possible applications to rhythm phylogeny [22, 40].

Toussaint [39] gave a simple $O(n^2)$-time algorithm for $\ell_1$ necklace alignment, and highlighted as
an interesting open question whether the problem could be solved in $o(n^2)$ time. In this paper, we
solve this open problem by giving $o(n^2)$-time algorithms for $\ell_1$, $\ell_2$, and $\ell_\infty$ necklace alignment, in
both the standard real RAM model of computation and the less realistic nonuniform linear decision
tree model of computation. Our results for the $\ell_1$ and $\ell_\infty$ distance measures in the real
RAM model also answer the questions posed by Clifford et al. [13] (see the shift matching
problem, Problem 5 of their technical report).

Necklace alignment problem. More formally, in the necklace alignment problem, the input is a
number $p$ representing the $\ell_p$ norm, and two sorted vectors of $n$ real numbers, $x = (x_0, x_1, \ldots, x_{n-1})$
and $y = (y_0, y_1, \ldots, y_{n-1})$, representing the two necklaces. See Figure 1. Canonically, we assume
that each number $x_i$ and $y_i$ is in the range $[0, 1)$, representing a point on the unit-circumference
circle (parameterized clockwise from some fixed point). The distance between two beads $x_i$ and $y_j$
is the minimum of the clockwise and counterclockwise distances along the circumference of
the unit-perimeter circular necklaces. We define this distance as:

$$d^\circ(x_i, y_j) = \min\{|x_i - y_j|,\ 1 - |x_i - y_j|\}.$$

The optimization problem involves two parameters. The first parameter, the offset $c \in [0, 1)$,
is the clockwise rotation angle of the first necklace relative to the second necklace. The second
parameter, the shift $s \in \{0, 1, \ldots, n\}$, defines the perfect matching between beads: bead $i$ of the
first necklace matches with bead $(i + s) \bmod n$ of the second necklace. (Here we use the property
that an optimal perfect matching between the beads does not cross itself.)




Figure 1: An example of necklace alignment: the input (left) and one possible output (right). 
The goal of the $\ell_p$ necklace alignment problem is to find the offset $c \in [0, 1)$ and the shift
$s \in \{0, 1, \ldots, n\}$ that minimize

$$\sum_{i=0}^{n-1} \Big(d^\circ\big((x_i + c) \bmod 1,\ y_{(i+s) \bmod n}\big)\Big)^p \qquad (1)$$

or, in the case $p = \infty$, that minimize

$$\max_{i=0}^{n-1} d^\circ\big((x_i + c) \bmod 1,\ y_{(i+s) \bmod n}\big).$$
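For concreteness, objective (1) can be evaluated directly for a given offset $c$ and shift $s$. The following is a minimal Python sketch that transcribes the definitions above; the names `circ_dist` and `necklace_cost` are hypothetical helpers introduced here for illustration, and `p = math.inf` stands for the max variant.

```python
import math


def circ_dist(a, b):
    """d°(a, b): length of the shorter arc between a and b on the unit-circumference circle."""
    d = abs(a - b)
    return min(d, 1.0 - d)


def necklace_cost(x, y, c, s, p):
    """Objective (1) for offset c and shift s; pass p = math.inf for the max version."""
    n = len(x)
    # Bead i of the rotated first necklace is matched with bead (i + s) mod n of the second.
    dists = [circ_dist((x[i] + c) % 1.0, y[(i + s) % n]) for i in range(n)]
    if p == math.inf:
        return max(dists)
    return sum(d ** p for d in dists)


x = [0.0, 0.5]
y = [0.1, 0.6]
print(necklace_cost(x, y, 0.1, 0, 1))         # rotated beads land on their partners: ~0.0
print(necklace_cost(x, y, 0.1, 1, math.inf))  # crossed matching: ~0.5
```

Evaluating this for every shift, with the best offset for each shift, is exactly the quadratic baseline described next.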

The $\ell_1$, $\ell_2$, and $\ell_\infty$ necklace alignment problems all have trivial $O(n^2)$ solutions, although this
might not be obvious from the definition. In each case, as we show, the optimal offset $c$ can
be computed in linear time for a given shift value $s$ (sometimes even independently of $s$). The
optimization problem is thus effectively over just $s \in \{0, 1, \ldots, n\}$, and the objective costs $O(n)$
time to compute for each $s$, giving an $O(n^2)$-time algorithm.

Although necklaces are studied throughout mathematics, mainly in combinatorial settings, we
are not aware of any work on the necklace alignment problem before Toussaint [39]. He introduced
$\ell_1$ necklace alignment, calling it the cyclic swap-distance or necklace swap-distance problem, with a
restriction that the beads lie at integer coordinates. Ardila et al. [2] give an $O(k^2)$-time algorithm for
computing the necklace swap-distance between two binary strings, with $k$ being the number of 1-bits
(beads at integer coordinates). Colannino et al. [16] consider some different distance measures
between two sets of points on the real line in which the matching does not have to match every
point. They do not, however, consider alignment under such distance measures.

Aloupis et al. [1] consider the problem of computing the similarity of two melodies represented
as closed orthogonal chains on a cylinder. Their goal is to find the proper (rigid) translation of
one of the chains in the vertical (representing pitch) and tangential (representing time) directions so
that the area between the chains is minimized. The authors present an $O(mn \lg(n + m))$ algorithm
that solves the problem. When the melodic chains each have a note at every time unit, the
melodic similarity problem is equivalent to the necklace alignment problem, and as our results are
subquadratic, we improve on the results of Aloupis et al. [1] for this special case.

Convolution. Our approach in solving the necklace alignment problem is based on reducing it
to another important problem, convolution, for which we also obtain improved algorithms. The
(+, ·) convolution of two vectors $x = (x_0, x_1, \ldots, x_{n-1})$ and $y = (y_0, y_1, \ldots, y_{n-1})$ is the vector
$x * y = (z_0, z_1, \ldots, z_{n-1})$ where $z_k = \sum_{i=0}^{k} x_i \cdot y_{k-i}$. One can generalize convolution to any $(\oplus, \otimes)$
operators. Algorithmically, a convolution with specified addition and multiplication operators
(here denoted $x \overset{\oplus,\otimes}{*} y$) can be easily computed in $O(n^2)$ time. However, the (+, ·) convolution can
be computed in $O(n \lg n)$ time using the Fast Fourier Transform [18, 29, 30], because the Fourier
transform converts convolution into elementwise multiplication. Indeed, fast (+, ·) convolution
was one of the early breakthroughs in algorithms, with applications to polynomial and integer
multiplication [5], batch polynomial evaluation [19, Problem 30-5], 3SUM [3, 23], string matching
[…, 25, 31, 32], matrix multiplication [15], and even juggling [8].
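To illustrate why the Fourier transform gives fast (+, ·) convolution, here is a minimal pure-Python sketch: transform both zero-padded vectors, multiply pointwise, and transform back. It assumes plain Python lists and floating-point arithmetic are acceptable (so integer results come back only approximately); the names `fft` and `convolve` are hypothetical, and a practical implementation would call an optimized FFT library instead.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def convolve(x, y):
    """(+, ·) convolution of x and y (all len(x) + len(y) - 1 entries) in O(n lg n) time."""
    m = len(x) + len(y) - 1
    size = 1
    while size < m:
        size *= 2
    fx = fft(list(x) + [0] * (size - len(x)))
    fy = fft(list(y) + [0] * (size - len(y)))
    # Pointwise product in the frequency domain, then inverse transform and normalize.
    z = fft([a * b for a, b in zip(fx, fy)], invert=True)
    return [(v / size).real for v in z[:m]]


print([round(v, 9) for v in convolve([1, 2, 3], [4, 5])])  # [4.0, 13.0, 22.0, 15.0]
```

Padding to a power of two that is at least `len(x) + len(y) - 1` prevents wraparound, so the cyclic product computed by the transform coincides with the ordinary convolution.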

In this paper we use three types of convolutions: (min, +) convolution, whose $k$th entry is
$z_k = \min_{i=0}^{k} \{x_i + y_{k-i}\}$; (median, +) convolution, whose $k$th entry is $z_k = \operatorname{median}_{i=0}^{k} \{x_i + y_{k-i}\}$; and
(+, ·) convolution, whose $k$th entry is $z_k = \sum_{i=0}^{k} x_i \cdot y_{k-i}$. As we show in Theorem 3, Theorem 6, and Section 5.1,
respectively, $\ell_2$ necklace alignment reduces to standard (+, ·) convolution, $\ell_\infty$ necklace alignment
reduces to (min, +) [and (max, +)] convolution, and $\ell_1$ necklace alignment reduces to (median, +)
convolution. The (min, +) convolution problem has appeared frequently in the literature, already
appearing in Bellman's early work on dynamic programming in the early 1960s [3, 24, 33–35, 38]. Its
name varies among "minimum convolution", "min-sum convolution", "inf-convolution", "infimal
convolution", and "epigraphical sum".¹ To date, however, no worst-case $o(n^2)$-time algorithms for
this convolution, or the more complex (median, +) convolution, have been obtained (note that
the quadratic worst-case running time for (median, +) convolution follows from
linear-time median finding [6, 36]). In this paper, we develop worst-case $o(n^2)$-time algorithms for
(min, +) and (median, +) convolution, in the real RAM and the nonuniform linear decision tree
models of computation.
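The three convolutions differ only in their two operators, which a naive $O(n^2)$ sketch makes explicit. The helper below is hypothetical; it returns all $2n - 1$ antidiagonal entries (for $k \le n - 1$ they match the definitions above), and for antidiagonals with an even number of terms it uses the lower median (`statistics.median_low`), a tie-breaking convention assumed here rather than taken from the text.

```python
from statistics import median_low


def convolve(x, y, reduce, combine):
    """z_k = reduce([combine(x_i, y_{k-i}) for all valid i]); O(len(x) * len(y)) time."""
    n, m = len(x), len(y)
    return [
        reduce([combine(x[i], y[k - i]) for i in range(max(0, k - m + 1), min(k, n - 1) + 1)])
        for k in range(n + m - 1)
    ]


def plus(a, b):
    return a + b


def times(a, b):
    return a * b


x, y = [0, 3, 4], [1, 6, 7]
print(convolve(x, y, sum, times))         # (+, ·):      [0, 3, 22, 45, 28]
print(convolve(x, y, min, plus))          # (min, +):    [1, 4, 5, 10, 11]
print(convolve(x, y, median_low, plus))   # (median, +): [1, 4, 7, 10, 11]
```

Only the (+, ·) case is known to admit the FFT speedup; the point of this paper is that the other two also admit subquadratic algorithms.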

The only subquadratic results for (min, +) convolution concern two special cases. First, the
(min, +) convolution of two convex sequences or functions can be trivially computed in $O(n)$ time
by a simple merge, which is the same as computing the Minkowski sum of two convex polygons [35].
This special case is already used in image processing and computer vision [24, 33]. Second, Bussieck
et al. [7] proved that the (min, +) convolution of two randomly permuted sequences can be computed
in $O(n \lg n)$ expected time. Our results are the first to improve the worst-case running time for
(min, +) convolution.
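The convex special case fits in a few lines, assuming "convex" means nondecreasing successive differences: the result starts at $x_0 + y_0$, and its successive differences are the two difference sequences merged in sorted order, which is the same edge merge used for Minkowski sums of convex polygons. This is an illustrative sketch with a hypothetical function name, not code from [35]; it needs Python 3.8+ for the `initial` argument of `itertools.accumulate`.

```python
from heapq import merge
from itertools import accumulate


def min_plus_convex(x, y):
    """(min, +) convolution of two convex sequences in O(len(x) + len(y)) time."""
    dx = [b - a for a, b in zip(x, x[1:])]   # nondecreasing if x is convex
    dy = [b - a for a, b in zip(y, y[1:])]   # nondecreasing if y is convex
    # Start at x_0 + y_0 and take the merged slopes in increasing order.
    return list(accumulate(merge(dx, dy), initial=x[0] + y[0]))


x = [5, 2, 0, 1, 4]   # differences -3, -2, 1, 3
y = [3, 1, 0, 0, 2]   # differences -2, -1, 0, 2
print(min_plus_convex(x, y))  # [8, 5, 3, 1, 0, 0, 1, 3, 6]
```

Without convexity the merged-slope argument fails, which is why the general case needs the techniques developed later.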

Connections to $X + Y$. The necklace alignment problems, and their corresponding convolution
problems, are also intrinsically connected to problems on $X + Y$ matrices. Given two lists of $n$
numbers, $X = (x_0, x_1, \ldots, x_{n-1})$ and $Y = (y_0, y_1, \ldots, y_{n-1})$, $X + Y$ is the matrix of all pairwise
sums, whose $(i, j)$th entry is $x_i + y_j$. A classic unsolved problem [20] is whether the entries of
$X + Y$ can be sorted in $o(n^2 \lg n)$ time. Fredman [27] showed that $O(n^2)$ comparisons suffice in
the nonuniform linear decision tree model, but it remains open whether this can be converted into
an $O(n^2)$-time algorithm in the real RAM model. Steiger and Streinu [37] gave a simple algorithm
that takes $O(n^2 \lg n)$ time while using only $O(n^2)$ comparisons.

The (min, +) convolution is equivalent to finding the minimum element in each antidiagonal
of the $X + Y$ matrix, and similarly the (max, +) convolution finds the maximum element in each
antidiagonal. We show that $\ell_\infty$ necklace alignment is equivalent to finding the antidiagonal of $X + Y$
with the smallest range (the maximum element minus the minimum element). The (median, +)
convolution is equivalent to finding the median element in each antidiagonal of the $X + Y$ matrix. We
show that $\ell_1$ necklace alignment is equivalent to finding the antidiagonal of $X + Y$ with the smallest
median cost (the total distance between each element and the median of the elements). Given the
apparent difficulty in sorting $X + Y$, it seems natural to believe that the minimum, maximum,
and median elements of every antidiagonal cannot be found, and that the corresponding objectives
cannot be minimized, any faster than $O(n^2)$ total time. Figure 2 shows a sample $X + Y$ matrix
with the maximum element in each antidiagonal marked, with no apparent structure. Nonetheless,
we show that $o(n^2)$ algorithms are possible.
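The antidiagonal view is easy to state in code. The quadratic sketch below (hypothetical names) groups the entries of $X + Y$ by $k = i + j$; the minimum, maximum, and median of group $k$ are the $k$th entries of the (min, +), (max, +), and (median, +) convolutions, and max minus min is the antidiagonal range mentioned above. It builds all $n^2$ sums explicitly, so it is only the baseline that the subquadratic algorithms improve on; the lower median is an assumed convention for even-sized groups.

```python
from statistics import median_low


def antidiagonals(X, Y):
    """Group the entries x_i + y_j of the X + Y matrix by antidiagonal index k = i + j."""
    diags = [[] for _ in range(len(X) + len(Y) - 1)]
    for i, xi in enumerate(X):
        for j, yj in enumerate(Y):
            diags[i + j].append(xi + yj)
    return diags


X, Y = [0, 3, 4], [1, 6, 7]
diags = antidiagonals(X, Y)
mins = [min(g) for g in diags]          # (min, +) convolution
maxs = [max(g) for g in diags]          # (max, +) convolution
meds = [median_low(g) for g in diags]   # (median, +) convolution, lower median
ranges = [hi - lo for hi, lo in zip(maxs, mins)]
print(mins, maxs, meds, ranges)
```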

Our results. In the standard real RAM model, we give subquadratic algorithms for the $\ell_1$, $\ell_2$,
and $\ell_\infty$ necklace alignment problems, and for the (min, +) and (median, +) convolution problems.
We present:

1. an $O(n \lg n)$-time algorithm on the real RAM for $\ell_2$ necklace alignment (Section 3).

1 "Tropical convolution" would also make sense, by direct analogy with tropical geometry, but we have never seen 
this terminology used in print. 
Figure 2: An $X + Y$ matrix. Each polygonal line denotes an antidiagonal of the matrix, with a
point at coordinates $(x, y)$ denoting the value $x + y$ for $x \in X$ and $y \in Y$. An × denotes the
maximum element in each antidiagonal.

2. an $O(n^2/\lg n)$-time algorithm on the real RAM for $\ell_\infty$ necklace alignment and (min, +)
convolution (Section 4). This algorithm uses a technique of Chan originally developed for the
all-pairs shortest paths problem [9]. Despite the roughly logarithmic factor improvements for
$\ell_1$ and $\ell_\infty$, this result does not use word-level bit tricks of word-RAM fame.

3. a further improved $O(n^2 (\lg\lg n)^3/\lg^2 n)$-time algorithm for $\ell_\infty$ necklace alignment and (min, +)
convolution (Section 4). We actually give a direct black-box reduction of (min, +) convolution
to all-pairs shortest paths; the result then follows from the current best upper bound for
all-pairs shortest paths [10]. The all-pairs shortest paths algorithm works in the real RAM with respect
to the inputs, i.e., it does not use bit tricks on the inputs. The algorithm, however, requires
bit tricks on other numbers, but works in a standard model that assumes $(\lg n)$-bit words.

4. an $O(n^2 (\lg\lg n)^2/\lg n)$-time algorithm on the real RAM for $\ell_1$ necklace alignment and (median, +)
convolution (Section 5). This algorithm uses an extension of the technique of Chan [9].

In the nonuniform linear decision tree model, we give particularly fast algorithms for the $\ell_1$ and
$\ell_\infty$ necklace alignment problems, using techniques of Fredman [27, 28]:

5. an $O(n\sqrt{n})$-time algorithm in the nonuniform linear decision tree model for $\ell_\infty$ necklace alignment
and (min, +) convolution (Section 4).

6. an $O(n\sqrt{n \lg n})$-time algorithm in the nonuniform linear decision tree model for $\ell_1$ necklace
alignment and (median, +) convolution (Section 5).

(Although we state our results here in terms of (min, +) and (median, +) convolution, the results
below use $-$ instead of $+$ because of the synergy with necklace alignment.) We also mention
connections to the venerable $X + Y$ and 3SUM problems in Section 6.

2 Linear Versus Circular Alignment 

Before we proceed with proving our results, we first show that any optimal solution to the necklace
alignment problem can be transformed into an optimal solution to the problem of linear alignment:
aligning and matching beads that are on a line. We then can use the simpler optimization function
of the "linear alignment problem" to show our results. Let $d^-(x_i, y_j) = |x_i - y_j|$ be the linear
distance between two beads $x_i$ and $y_j$. In the linear alignment problem we are given two sorted
vectors of real numbers $x = (x_0, x_1, \ldots, x_{n-1})$ and $y = (y_0, y_1, \ldots, y_{m-1})$ with $m \ge n$, and we want
to find $s$ ($0 \le s \le m - n$) and $c$ that minimize

$$\sum_{i=0}^{n-1} \big(d^-(x_i + c,\ y_{i+s})\big)^p \qquad (2)$$

or, in the case $p = \infty$, that minimize

$$\max_{i=0}^{n-1} d^-(x_i + c,\ y_{i+s}).$$

The main difference between (1) and (2) is that instead of taking the minimum of the
clockwise and counterclockwise distances between pairs of matched beads in (1), we are simply
summing the forward distances between beads in (2). We will now show that whether the beads
are on a line (repeating $y$ infinitely many times on the line) or on a circle, the optimal alignments of
the beads $x$ and $y$ in these two cases have equal cost.
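To back the earlier claim that the optimal offset is cheap to find once the shift is fixed, here is a sketch for the linear objective (2) with $p \in \{1, 2, \infty\}$. Writing each term as $|c - g_i|$ with $g_i = y_{i+s} - x_i$, a median of the $g_i$ minimizes the $p = 1$ sum, their mean minimizes the $p = 2$ sum, and their midrange minimizes the maximum (compare Fact 5 in Section 4). Function names are hypothetical, `y` is the length-$m$ vector of the linear problem, and the code assumes $0 \le s \le m - n$. Because `statistics.median_low` sorts, the $p = 1$ branch here takes $O(n \lg n)$ time per shift; linear-time selection would bring it to $O(n)$.

```python
import math
from statistics import mean, median_low


def linear_cost(x, y, s, c, p):
    """Objective (2) for shift s and offset c; p = math.inf gives the max version."""
    d = [abs(xi + c - y[i + s]) for i, xi in enumerate(x)]
    return max(d) if p == math.inf else sum(di ** p for di in d)


def best_offset(x, y, s, p):
    """Optimal c for a fixed shift s: term i equals |c - g_i| with g_i = y[i + s] - x[i]."""
    gaps = [y[i + s] - xi for i, xi in enumerate(x)]
    if p == 1:
        return median_low(gaps)             # any median minimizes sum |c - g_i|
    if p == 2:
        return mean(gaps)                   # the mean minimizes sum (c - g_i)^2
    if p == math.inf:
        return (min(gaps) + max(gaps)) / 2  # the midrange minimizes max |c - g_i|
    raise ValueError("this sketch only covers p in {1, 2, inf}")


x = [0.1, 0.4, 0.7]
y = [0.2, 0.3, 0.9, 1.2, 1.3, 1.9]          # m = 6 >= n = 3, so 0 <= s <= 3
for p in (1, 2, math.inf):
    c = best_offset(x, y, 1, p)
    print(p, c, linear_cost(x, y, 1, c, p))
```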

Let $M^\circ$ and $M^-$ be an alignment/matching of the beads of $x$ and $y$ along the unit-circumference
circle $C$ and the infinite line $L$, respectively. An edge $(x_i, y_j)$ of $M^\circ$ ($M^-$) is the shortest
segment that connects two matched beads $x_i$ and $y_j$ in $M^\circ$ ($M^-$); thus, the length of $(x_i, y_j)$ is
equal to $d^\circ(x_i, y_j)$ ($d^-(x_i, y_j)$). We will show that the sums of the lengths of the edges of
the optimal matchings $M^{\circ*}$ and $M^{-*}$ are equal. Note that by the quadrangle inequality,
the edges of both $M^{\circ*}$ and $M^{-*}$ are non-crossing.

Observation 1. Consider any edge $(x_i, y_j)$ along the circular necklace. If this edge crosses point
0, then the distance

$$d^\circ(x_i, y_j) = 1 - |x_i - y_j|;$$

otherwise,

$$d^\circ(x_i, y_j) = |x_i - y_j|.$$

Let $yy$ be the doubling of the vector $y$ such that

$$yy = (y_0, \ldots, y_{m-1}, y_0, \ldots, y_{m-1}) = (yy_0, \ldots, yy_{m-1}, yy_m, \ldots, yy_{2m-1}).$$

Theorem 2. If $M^{\circ*}$ is the optimal matching of two given vectors $x$ and $y$ (both of length $n$) along
the unit-circumference circle $C$ and $M^{-*}$ is the optimal matching of $x$ and $yy$ along line $L$, then
$|M^{\circ*}| = |M^{-*}|$.
Proof. First we show that the value of any optimal matching of a set of beads along $L$ is at least
as large as the value of the optimal solution of the beads along $C$. Given an optimal matching
$M^{-*} = (s, c)$ of the beads along $L$, we will "wrap" the line $L$ around a unit circle $C$ by mapping each of
the $n$ matched beads $x_i$ ($yy_{i+s}$) along $L$ to $x_i^\circ$ ($y^\circ_{(i+s) \bmod n}$) along $C$. (Thus we have exactly $n$ pairs
of beads along $C$.) Now, for every $i = 0, 1, \ldots, n-1$, the length of an edge of $M^{-*}$ is equal to

$$\begin{aligned}
d^-(x_i + c,\ yy_{i+s}) &= d^-(x_i + c,\ y_{(i+s) \bmod n})\\
&= |x_i + c - y_{(i+s) \bmod n}|\\
&\ge \min\{|(x_i + c) \bmod 1 - y_{(i+s) \bmod n}|,\ 1 - |(x_i + c) \bmod 1 - y_{(i+s) \bmod n}|\}\\
&= d^\circ\big((x_i^\circ + c) \bmod 1,\ y^\circ_{(i+s) \bmod n}\big).
\end{aligned}$$

Thus, as every edge length of the matching $M^{-*}$ is at least as large as its corresponding edge along
the circle $C$, we have $|M^{-*}| \ge |M^{\circ*}|$.

Next we show that the value of any optimal matching of a set of beads along $C$ is at least
as large as the value of the optimal solution of the beads along $L$. Suppose we have an optimal
matching $M^{\circ*} = (s, c)$. We map every point $x_i + c$ and $y_i$ to the infinite line so that the
edges of $M^{\circ*}$ are preserved in $M^-$. Thus, for all $i = 0, 1, \ldots, n-1$ and $k \in \mathbb{Z}$, we map

$$(x_i + c) \bmod 1 \mapsto x_i^- = x_i + c,$$

$$y_i \mapsto y^-_{i+kn} = y_i + k.$$

With this transformation, in any valid matching $M^-$, the matched beads span at most two
consecutive intervals $[k, k+1)$ and $[k+1, k+2)$ for any $k \in \mathbb{Z}$. In particular, the beads $x_i + c$ span
the intervals $[0, 1)$ and $[1, 2)$.

Now we construct $M^-$ given $M^{\circ*}$ by matching every $x_i^-$ to $y^-_{((i+s) \bmod n) + kn}$ such that, whenever $x_i + c < 1$
(see Figure 3),

• $k = -1$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ crosses point 0 and $(x_i + c) \bmod 1 < y_{(i+s) \bmod n}$;

• $k = 1$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ crosses point 0 and $(x_i + c) \bmod 1 > y_{(i+s) \bmod n}$;

• $k = 0$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ does not cross point 0;

and, whenever $x_i + c \ge 1$, we increment $k$ by 1 in each of the cases. Thus, when $x_i + c \ge 1$,

• $k = 0$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ crosses point 0 and $(x_i + c) \bmod 1 < y_{(i+s) \bmod n}$;

• $k = 2$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ crosses point 0 and $(x_i + c) \bmod 1 > y_{(i+s) \bmod n}$;

• $k = 1$ if $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$ does not cross point 0.
(a) A matching $M^\circ$ of beads along a circular necklace with three types of edges: those that do not cross point 0, and those that cross point 0 with $(x_i + c) \bmod 1$ either greater or less than $y_{i+s}$. (b) Unrolling a necklace to a line $L$ by preserving the edges of $M^\circ$ in the matching $M^-$. In this example, $0 \le x_0 + c < 1$.

Figure 3: Unrolling a circular necklace to a line.



Here, the variable $k$ essentially determines the interval $[-1, 0)$, $[0, 1)$, or $[1, 2)$ in which the bead
$y_{(i+s) \bmod n}$ is located, based on the type of the edge $((x_i + c) \bmod 1,\ y_{(i+s) \bmod n})$. Observe that,
if an edge in $M^{\circ*}$ crosses point 0, then its corresponding edge in $M^-$ crosses $(r, r)$ for some
$r \in \{0, 1, 2\}$.

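Now, the sum of the distances of the matched beads of $M^-$ is equal to

$$\begin{aligned}
&d^-(x_0 + c,\ y_{(0+s) \bmod n} + k) + d^-(x_1 + c,\ y_{(1+s) \bmod n} + k) + \cdots\\
&\quad + d^-(x_{n-s-1} + c,\ y_{n-1} + k) + d^-(x_{n-s} + c,\ y_0 + k + 1) + \cdots\\
&\quad + d^-(x_{n-1} + c,\ y_{s-1} + k + 1).
\end{aligned}$$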

We claim that the value of $M^{\circ*}$ is at least the value of this matching $M^-$. We show
this claim by comparing the length of each edge $((x_i + c) \bmod 1,\ y_{i+s})$ of $M^{\circ*}$ with its corresponding
edge in $M^-$.

Edges that do not cross point 0: If an edge of $M^{\circ*}$ does not cross point 0, then the corresponding
edge in $M^-$ does not cross any of the edges $(0, 0)$, $(1, 1)$, or $(2, 2)$; hence both endpoints
(beads) of the given matching edge are within the same interval $[r, r+1)$ for some $r \in \{0, 1\}$ (the
purple edge $(x_0 + c,\ y_{0+s})$ in Figure 3). This means that, when $x_i + c = (x_i + c) \bmod 1$, we have
$k = 0$ and

$$\begin{aligned}
d^-(x_i + c,\ y_{(i+s) \bmod n} + k) &= |(x_i + c) - (y_{(i+s) \bmod n} + k)|\\
&= |(x_i + c) \bmod 1 - y_{(i+s) \bmod n}|\\
&= d^\circ\big((x_i + c) \bmod 1,\ y_{(i+s) \bmod n}\big) \quad \text{(by Observation 1)}.
\end{aligned}$$

We can similarly show that, when $x_i + c = (x_i + c) \bmod 1 + 1$, we have $k = 1$ and the edges of
$M^{\circ*}$ that do not cross point 0 have the same length as their corresponding edges in $M^-$.

Edges that cross point 0: If an edge of $M^{\circ*}$ crosses point 0, then the corresponding edge in
$M^-$ crosses edge $(r, r)$ and hence the two endpoints (beads) of the given edge of $M^-$ must be in
different and consecutive intervals: $[r-1, r)$ and $[r, r+1)$ for some $r \in \{0, 1, 2\}$ (the green and blue
edges $(x_i + c,\ y_{i+s})$, $(x_j + c,\ y_{j+s})$ in Figure 3). Then, assuming $x_i + c = (x_i + c) \bmod 1$, we have the following.

When $(x_i + c) \bmod 1 < y_{(i+s) \bmod n}$, $k = -1$ and

$$\begin{aligned}
d^-(x_i + c,\ y_{(i+s) \bmod n} + k) &= |(x_i + c) - (y_{(i+s) \bmod n} - 1)|\\
&= |(x_i + c) \bmod 1 - y_{(i+s) \bmod n} + 1|\\
&= 1 - |(x_i + c) \bmod 1 - y_{(i+s) \bmod n}|\\
&= d^\circ\big((x_i + c) \bmod 1,\ y_{(i+s) \bmod n}\big) \quad \text{(by Observation 1)}.
\end{aligned}$$

When $(x_i + c) \bmod 1 > y_{(i+s) \bmod n}$, $k = 1$ and

$$\begin{aligned}
d^-(x_i + c,\ y_{(i+s) \bmod n} + k) &= |(x_i + c) - (y_{(i+s) \bmod n} + 1)|\\
&= |(x_i + c) \bmod 1 - y_{(i+s) \bmod n} - 1|\\
&= 1 - |(x_i + c) \bmod 1 - y_{(i+s) \bmod n}|\\
&= d^\circ\big((x_i + c) \bmod 1,\ y_{(i+s) \bmod n}\big) \quad \text{(by Observation 1)}.
\end{aligned}$$

We can similarly show that, when $x_i + c = (x_i + c) \bmod 1 + 1$, the edges of $M^{\circ*}$ that cross
point 0 have the same length as their corresponding edges in $M^-$.

Therefore, the length of every edge of $M^{\circ*}$ along the circle is equal to the length of its corresponding
edge in $M^-$. Thus, the value of the matching $M^{\circ*}$ is at least as large as that of $M^{-*}$,
completing the proof of the theorem. □



We now proceed to prove our results by using the objective function (2).



3 $\ell_2$ Necklace Alignment and (+, ·) Convolution



In this section, we first show how $\ell_2$ necklace alignment reduces to standard convolution, leading
to an $O(n \lg n)$-time algorithm that uses the Fast Fourier Transform. We then show how this result
generalizes to $\ell_p$ for any even $p$. It should be noted here that the $\ell_2$ necklace alignment problem
was solved independently by Clifford et al. [13] (see Problem 5) using Fast Fourier Transforms.
Our proof uses essentially the same technique of expanding the squared term and then optimizing
terms separately, but goes through the steps in more detail; we include our proof for completeness.
More results that use the FFT to solve different flavors of matching problems may also be found 
in [H] and [II]. 

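Theorem 3. The $\ell_2$ necklace alignment problem can be solved in $O(n \lg n)$ time on a real RAM.

Proof. The objective (2) expands algebraically to

$$\begin{aligned}
\sum_{i=0}^{n-1} \big(x_i - y_{(i+s) \bmod n} + c\big)^2
&= \sum_{i=0}^{n-1} \big(x_i^2 + y_{(i+s) \bmod n}^2 + 2cx_i - 2cy_{(i+s) \bmod n} + c^2\big) - 2\sum_{i=0}^{n-1} x_i y_{(i+s) \bmod n}\\
&= \sum_{i=0}^{n-1} \big(x_i^2 + y_i^2 + 2cx_i - 2cy_i + c^2\big) - 2\sum_{i=0}^{n-1} x_i y_{(i+s) \bmod n}\\
&= \Big(\sum_{i=0}^{n-1} (x_i^2 + y_i^2) + 2c\sum_{i=0}^{n-1} (x_i - y_i) + nc^2\Big) - 2\sum_{i=0}^{n-1} x_i y_{(i+s) \bmod n}.
\end{aligned}$$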
The first term depends solely on the inputs and the variable $c$, while the second term depends
solely on the inputs and the variable $s$. Thus the two terms can be optimized separately. The first
term can be optimized in $O(n)$ time by solving for when its derivative $2\sum_{i=0}^{n-1}(x_i - y_i) + 2nc$, which is linear in $c$, is zero,
i.e., $c = -\frac{1}{n}\sum_{i=0}^{n-1}(x_i - y_i)$. The second term can be computed, for each $s \in \{0, 1, \ldots, n-1\}$, in $O(n \lg n)$ time using (+, ·)
convolution (and therefore optimized in the same time). Specifically, define the vectors

$$x' = (x_0, x_1, \ldots, x_{n-1};\ \underbrace{0, 0, \ldots, 0}_{n}),$$

$$y' = (y_{n-1}, y_{n-2}, \ldots, y_0;\ y_{n-1}, y_{n-2}, \ldots, y_0).$$

Then, for $s' \in \{0, 1, \ldots, n-1\}$, the $(n + s')$th entry of the convolution $x' * y'$ is

$$\sum_{i=0}^{n+s'} x'_i\, y'_{n+s'-i} = \sum_{i=0}^{n-1} x_i\, y_{(i-s'-1) \bmod n},$$

which is the desired entry if we let $s' = n - 1 - s$. We can compute the entire convolution in
$O(n \lg n)$ time using the Fast Fourier Transform. □
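The reduction in this proof can be sketched end to end. The code below computes $c$ from the first term, builds $x'$ and $y'$ exactly as defined above, reads entry $2n - 1 - s$ (that is, $n + s'$ with $s' = n - 1 - s$) from one FFT-based convolution, and picks the shift that maximizes the cross term. It is a sketch under stated assumptions: it works with the linearized objective used in the proof, uses floating-point arithmetic, and carries its own small recursive FFT (hypothetical helper names) only so that it stays self-contained.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (len(a) must be a power of two); unnormalized inverse."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def convolve(a, b):
    """(+, ·) convolution via FFT, zero-padded to a power of two to avoid wraparound."""
    m = len(a) + len(b) - 1
    size = 1 << (m - 1).bit_length()
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    prod = fft([u * w for u, w in zip(fa, fb)], invert=True)
    return [(v / size).real for v in prod[:m]]


def l2_alignment(x, y):
    """Return (cost, s, c) minimizing sum_i (x_i - y_{(i+s) mod n} + c)^2."""
    n = len(x)
    sum_diff = sum(xi - yi for xi, yi in zip(x, y))
    c = -sum_diff / n  # root of the derivative 2*sum_diff + 2*n*c
    first = sum(xi * xi + yi * yi for xi, yi in zip(x, y)) + 2 * c * sum_diff + n * c * c
    x_pad = list(x) + [0.0] * n           # x' = (x_0, ..., x_{n-1}; 0, ..., 0)
    y_rev = list(reversed(y)) * 2         # y' = (y_{n-1}, ..., y_0; y_{n-1}, ..., y_0)
    conv = convolve(x_pad, y_rev)
    # Entry n + s' with s' = n - 1 - s (index 2n - 1 - s) equals sum_i x_i * y_{(i+s) mod n}.
    s = max(range(n), key=lambda t: conv[2 * n - 1 - t])
    return first - 2 * conv[2 * n - 1 - s], s, c


x = [0.05, 0.30, 0.62, 0.90]
y = [0.10, 0.20, 0.55, 0.80]
cost, s, c = l2_alignment(x, y)
print(cost, s, c)
```

Because $c$ does not depend on $s$, minimizing the objective over $s$ is the same as maximizing the cross term $\sum_i x_i y_{(i+s) \bmod n}$, which is why the sketch uses `max`.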

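The above result can be generalized to $\ell_p$ for any fixed even integer $p$. When $p \ge 4$, expanding
the objective and rearranging the terms results in

$$\sum_{i=0}^{n-1} \big(x_i - y_{(i+s) \bmod n} + c\big)^p = \sum_{j=0}^{p} \binom{p}{j} c^{p-j} \sum_{i=0}^{n-1} \big(x_i - y_{(i+s) \bmod n}\big)^j,$$

which is a degree-$p$ polynomial in $c$, all of whose coefficients can be computed for all values of $s$ by
computing $O(p^2)$ convolutions: expanding each $(x_i - y_{(i+s) \bmod n})^j$ binomially once more writes every
coefficient as a combination of sums $\sum_{i} x_i^{a}\, y_{(i+s) \bmod n}^{b}$ with $a + b \le p$, each of which is one convolution as in Theorem 3.

Theorem 4. The $\ell_p$ necklace alignment problem with $p$ even can be solved in $O(p^2 n \lg n)$ time on
a real RAM.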
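The coefficient computation behind Theorem 4 can be sketched as follows, assuming integer inputs so that the check is exact. Each correlation $\sum_i x_i^{a} (-y_{(i+s) \bmod n})^{j-a}$ is computed here naively in $O(n^2)$ time as a stand-in for one FFT convolution; the point of the sketch is the count of $(p+1)(p+2)/2 = O(p^2)$ such correlations, not their speed. Names are hypothetical.

```python
from math import comb


def cyclic_corr(u, v):
    """corr[s] = sum_i u[i] * v[(i + s) % n]; naive stand-in for one FFT-based convolution."""
    n = len(u)
    return [sum(u[i] * v[(i + s) % n] for i in range(n)) for s in range(n)]


def shift_polynomials(x, y, p):
    """coef[s][e] = coefficient of c**e in sum_i (x_i - y_{(i+s) mod n} + c)**p, for every s."""
    n = len(x)
    coef = [[0] * (p + 1) for _ in range(n)]
    for j in range(p + 1):          # term C(p, j) * c**(p - j) * sum_i (x_i - y_{(i+s) mod n})**j
        for a in range(j + 1):      # (x_i - y)**j = sum_a C(j, a) * x_i**a * (-y)**(j - a)
            corr = cyclic_corr([xi ** a for xi in x], [(-yi) ** (j - a) for yi in y])
            for s in range(n):
                coef[s][p - j] += comb(p, j) * comb(j, a) * corr[s]
    return coef                     # (p + 1)(p + 2) / 2 correlations in total


x, y, p = [1, 2, 5], [0, 3, 4], 4
coef = shift_polynomials(x, y, p)
for s in range(len(x)):
    via_poly = sum(coef[s][e] * 2 ** e for e in range(p + 1))        # evaluate at c = 2
    direct = sum((x[i] - y[(i + s) % 3] + 2) ** p for i in range(3))
    print(s, via_poly, direct)                                       # the two columns agree
```

Minimizing each resulting degree-$p$ polynomial over $c$ is a separate, constant-size step per shift once $p$ is fixed; the sketch stops at the coefficients.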

4 $\ell_\infty$ Necklace Alignment and (min, +) Convolution
4.1 Reducing $\ell_\infty$ Necklace Alignment to (min, +) Convolution

First we show the relation between $\ell_\infty$ necklace alignment and (min, +) convolution. We need the
following basic fact:

Fact 5. For any vector $z = (z_0, z_1, \ldots, z_{n-1})$, the minimum over $c$ of $\max_{i=0}^{n-1} |z_i + c|$ is attained at
$c = -\frac{1}{2}\big(\min_{i=0}^{n-1} z_i + \max_{i=0}^{n-1} z_i\big)$, and its value is

$$\frac{1}{2}\Big(\max_{i=0}^{n-1} z_i - \min_{i=0}^{n-1} z_i\Big).$$

Instead of using (min, +) convolution directly, we use two equivalent forms, (min, −) and
(max, −) convolution:
Theorem 6. The $\ell_\infty$ necklace alignment problem can be reduced in $O(n)$ time to one (min, −)
convolution and one (max, −) convolution.

Proof. For two necklaces $x$ and $y$, we apply the (min, −) convolution to the following vectors:

$$x' = (x_0, x_1, \ldots, x_{n-1};\ \underbrace{\infty, \infty, \ldots, \infty}_{n}),$$

$$y' = (y_{n-1}, y_{n-2}, \ldots, y_0;\ y_{n-1}, y_{n-2}, \ldots, y_0).$$

Then, for $s' \in \{0, 1, \ldots, n-1\}$, the $(n + s')$th entry of $x' \overset{\min}{*} y'$ is

$$\min_{i=0}^{n+s'} \big(x'_i - y'_{n+s'-i}\big) = \min_{i=0}^{n-1} \big(x_i - y_{(i-s'-1) \bmod n}\big),$$

which is $\min_{i=0}^{n-1} (x_i - y_{(i+s) \bmod n})$ if we let $s' = n - 1 - s$. By symmetry, we can compute the (max, −)
convolution $x'' \overset{\max}{*} y'$, where $x''$ has $-\infty$'s in place of $\infty$'s, and use it to compute
$\max_{i=0}^{n-1} (x_i - y_{(i+s) \bmod n})$ for each $s \in \{0, 1, \ldots, n-1\}$. Applying Fact 5, we can therefore minimize
$\max_{i=0}^{n-1} |x_i - y_{(i+s) \bmod n} + c|$ over $c$ for each $s \in \{0, 1, \ldots, n-1\}$. By brute force, we can minimize over $s$ as
well using $O(n)$ additional comparisons and time. □
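The proof of Theorem 6 translates into a short sketch. The (min, −) and (max, −) convolutions below are naive quadratic stand-ins (any faster algorithm from this section could be substituted); the vectors $x'$, $x''$, and $y'$ are built as in the proof, and Fact 5 turns the two extreme values for each shift into the optimal offset and cost. As in the proof, it works with the linearized differences $x_i - y_{(i+s) \bmod n}$; names are hypothetical.

```python
INF = float("inf")


def min_minus_conv(a, b):
    """Naive (min, -) convolution: z_k = min over i + j = k of (a_i - b_j)."""
    z = [INF] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            z[i + j] = min(z[i + j], ai - bj)
    return z


def max_minus_conv(a, b):
    """Naive (max, -) convolution: z_k = max over i + j = k of (a_i - b_j)."""
    z = [-INF] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            z[i + j] = max(z[i + j], ai - bj)
    return z


def linf_alignment(x, y):
    """Return (cost, s, c) minimizing max_i |x_i - y_{(i+s) mod n} + c|."""
    n = len(x)
    y2 = list(reversed(y)) * 2                      # y' from the proof
    lo = min_minus_conv(list(x) + [INF] * n, y2)    # x' padded with +inf
    hi = max_minus_conv(list(x) + [-INF] * n, y2)   # x'' padded with -inf
    best = None
    for s in range(n):
        k = 2 * n - 1 - s                           # entry n + s' with s' = n - 1 - s
        cost, c = (hi[k] - lo[k]) / 2, -(hi[k] + lo[k]) / 2   # Fact 5
        if best is None or cost < best[0]:
            best = (cost, s, c)
    return best


x = [0.05, 0.30, 0.62, 0.90]
y = [0.10, 0.20, 0.55, 0.80]
cost, s, c = linf_alignment(x, y)
print(cost, s, c)
```

The infinite padding only ever produces $\pm\infty$ minus a finite number, so no undefined $\infty - \infty$ arises, and it guarantees that the padded entries never win the min (respectively max).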

4.2 (min, −) Convolution in Nonuniform Linear Decision Tree

For our nonuniform linear decision tree results, we use the main theorem of Fredman's work on 
sorting X + Y: 

Theorem 7. [27] For any fixed set $\Gamma$ of permutations of $N$ elements, there is a comparison tree
of depth $O(N + \lg |\Gamma|)$ that sorts any sequence whose rank permutation belongs to $\Gamma$.

Theorem 8. The (min, −) convolution of two vectors of length $n$ can be computed in $O(n\sqrt{n})$ time
in the nonuniform linear decision tree model.

Proof. Let $x$ and $y$ denote the two vectors of length $n$, and let $x \overset{\min}{*} y$ denote their (min, −) convolution,
whose $k$th entry is $\min_{i=0}^{k} (x_i - y_{k-i})$.

First we sort the set $D = \{x_i - x_j,\ y_i - y_j : |i - j| < d\}$ of pairwise differences between nearby
$x_i$'s and nearby $y_i$'s, where $d < n$ is a value to be determined later. This set $D$ has $N = O(nd)$
elements. The possible sorted orders of $D$ correspond to cells in the arrangement of hyperplanes
in $\mathbb{R}^{2n}$ induced by all $\binom{N}{2}$ possible comparisons between elements in the set, and this hyperplane
arrangement has $O(N^{4n})$ cells (an arrangement of $O(N^2)$ hyperplanes in $\mathbb{R}^{2n}$ has $O((N^2)^{2n})$ cells), so
$\lg |\Gamma| = O(n \lg N)$. By Theorem 7, there is a comparison tree sorting $D$ of depth
$O(N + n \lg N) = O(nd + n \lg n)$.

The comparisons we make to sort $D$ enable us to compare $x_i - y_{k-i}$ versus $x_j - y_{k-j}$ for free,
provided $|i - j| < d$, because $x_i - y_{k-i} < x_j - y_{k-j}$ precisely if $x_i - x_j < y_{k-i} - y_{k-j}$. Thus, in
particular, we can compute

$$M_k(\lambda) = \min\{x_i - y_{k-i} : i = \lambda, \lambda + 1, \ldots, \min\{\lambda + d, n\} - 1\}$$

for free (using the outcomes of the comparisons we have already made).

We can rewrite the $k$th entry $\min_{i=0}^{k} (x_i - y_{k-i})$ of $x \overset{\min}{*} y$ as $\min\{M_k(0), M_k(d), M_k(2d), \ldots, M_k(\lfloor k/d \rfloor d)\}$,
and thus we can compute it in $O(k/d) = O(n/d)$ comparisons between differences. Therefore all $n$
entries can be computed in $O(nd + n \lg n + n^2/d)$ total time.

This asymptotic running time is minimized when $nd = \Theta(n^2/d)$, i.e., when $d^2 = \Theta(n)$. Substituting
$d = \sqrt{n}$, we obtain a running time of $O(n\sqrt{n})$ in the nonuniform linear decision tree
model. □
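The structure of this proof can be simulated on a real RAM to check the indexing, although doing so does not save time there. In the sketch below (hypothetical names), the set $D$ is sorted once, every comparison inside a block of $d$ consecutive terms is answered by comparing ranks in that sorted order, exactly as the equivalence $x_i - y_{k-i} < x_j - y_{k-j} \iff x_i - x_j < y_{k-i} - y_{k-j}$ allows, and only the $O(k/d)$ block minima are compared directly. It returns the first $n$ entries, matching the convolution used in this section.

```python
import math


def min_minus_conv_blocked(x, y, d=None):
    """First n entries z_k = min_{i<=k} (x_i - y_{k-i}); in-block comparisons use only ranks in D."""
    n = len(x)
    d = d or max(1, math.isqrt(n))
    # D: differences between nearby x's and nearby y's (|i - j| < d), sorted once.
    diff = {}
    for i in range(n):
        for j in range(max(0, i - d + 1), min(n, i + d)):
            diff["x", i, j] = x[i] - x[j]
            diff["y", i, j] = y[i] - y[j]
    rank = {v: r for r, v in enumerate(sorted(set(diff.values())))}

    def less(i, j, k):
        # x_i - y_{k-i} < x_j - y_{k-j}  iff  x_i - x_j < y_{k-i} - y_{k-j}
        return rank[diff["x", i, j]] < rank[diff["y", k - i, k - j]]

    z = []
    for k in range(n):
        block_minima = []
        for start in range(0, k + 1, d):             # M_k(0), M_k(d), M_k(2d), ...
            arg = start
            for i in range(start + 1, min(start + d, k + 1)):
                if less(i, arg, k):                  # answered by the sorted order of D
                    arg = i
            block_minima.append(x[arg] - y[k - arg])
        z.append(min(block_minima))                  # O(k/d) direct comparisons
    return z


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
y = [2, 7, -1, 8, 2, -8, 1, 8, 2, -8]
z = min_minus_conv_blocked(x, y)
print(z)
```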

Combining Theorems 6 and 8, we obtain the following result:

Corollary 9. The $\ell_\infty$ necklace alignment problem can be solved in $O(n\sqrt{n})$ time in the nonuniform
linear decision tree model.



4.3 (min, — ) Convolution in Real RAM via Geometric Dominance 

Our algorithm on the real RAM uses the following geometric lemma from Chan's work on all-pairs 
shortest paths: 

Lemma 10. [9, Lemma 2.1] Given $n$ points $p_1, p_2, \ldots, p_n$ in $d$ dimensions, each colored either red
or blue, we can find the $P$ pairs $(p_i, p_j)$ for which $p_i$ is red, $p_j$ is blue, and $p_i$ dominates $p_j$ (i.e.,
for all $k$, the $k$th coordinate of $p_i$ is at least the $k$th coordinate of $p_j$), in $2^{O(d)} n^{1+\varepsilon} + O(P)$ time for
arbitrarily small $\varepsilon > 0$.

Theorem 11. The (min, −) convolution of two vectors of length $n$ can be computed in $O(n^2/\lg n)$
time on a real RAM.

Proof. Let $x$ and $y$ denote the two vectors of length $n$, and let $x \overset{\max}{*} y$ denote their (max, −) convolution.
(Symmetrically, we can compute the (min, −) convolution.) For each $\delta \in \{0, 1, \ldots, d-1\}$, each
$i \in \{0, d, 2d, \ldots, \lfloor n/d \rfloor d\}$, and each $j \in \{0, 1, \ldots, n-1\}$, we define the $d$-dimensional points

$$p_{\delta,i} = (x_{i+\delta} - x_i,\ x_{i+\delta} - x_{i+1},\ \ldots,\ x_{i+\delta} - x_{i+d-1}),$$

$$q_{\delta,j} = (y_{j-\delta} - y_j,\ y_{j-\delta} - y_{j-1},\ \ldots,\ y_{j-\delta} - y_{j-d+1}).$$

(To handle boundary cases, define $x_i = \infty$ and $y_j = -\infty$ for indices $i, j$ outside $[0, n-1]$.) For each
$\delta \in \{0, 1, \ldots, d-1\}$, we apply Lemma 10 to the set of red points $\{p_{\delta,i} : i = 0, d, 2d, \ldots, \lfloor n/d \rfloor d\}$
and the set of blue points $\{q_{\delta,j} : j = 0, 1, \ldots, n-1\}$, to obtain all dominating pairs $(p_{\delta,i}, q_{\delta,j})$.

Point $p_{\delta,i}$ dominates $q_{\delta,j}$ precisely if $x_{i+\delta} - x_{i+\delta'} \ge y_{j-\delta} - y_{j-\delta'}$ for all $\delta' \in \{0, 1, \ldots, d-1\}$
(ignoring the indices outside $[0, n-1]$). By rearranging terms, this condition is equivalent to
$x_{i+\delta} - y_{j-\delta} \ge x_{i+\delta'} - y_{j-\delta'}$ for all $\delta' \in \{0, 1, \ldots, d-1\}$. If we substitute $j = k - i$, we obtain that
$(p_{\delta,i}, q_{\delta,k-i})$ is a dominating pair precisely if $x_{i+\delta} - y_{k-i-\delta} = \max_{\delta'} (x_{i+\delta'} - y_{k-i-\delta'})$. Thus, the set
of dominating pairs gives us the maximum $M_k(i) = \max\{x_i - y_{k-i},\ x_{i+1} - y_{k-i-1},\ \ldots,\ x_{\min\{i+d,n\}-1} - y_{k-\min\{i+d,n\}+1}\}$
for each $i$ divisible by $d$ and for each $k$. Also, there can be at most $O(n^2/d)$
such pairs for all $i, j, \delta$, because there are $O(n/d)$ choices for $i$ and $O(n)$ choices for $j$, and if
$(p_{\delta,i}, q_{\delta,j})$ is a dominating pair, then $(p_{\delta',i}, q_{\delta',j})$ cannot be a dominating pair for any $\delta' \ne \delta$. (Here
we assume that the max is achieved uniquely, which can be arranged by standard perturbation
techniques or by breaking ties consistently [9].) Hence, the running time of the $d$ executions
of Lemma 10 is $d\, 2^{O(d)} n^{1+\varepsilon} + O(n^2/d)$ time, which is $O(n^2/\lg n)$ if we choose $d = \alpha \lg n$ for a
sufficiently small constant $\alpha > 0$ (then $2^{O(d)} = n^{O(\alpha)}$, so the first term is $o(n^2/\lg n)$). We can rewrite the $k$th entry
$\max_{i=0}^{k} (x_i - y_{k-i})$ of $x \overset{\max}{*} y$ as $\max\{M_k(0), M_k(d), M_k(2d), \ldots, M_k(\lfloor k/d \rfloor d)\}$, and thus we can compute it in
$O(k/d) = O(n/d)$ time. Therefore all $n$ entries can be computed in $O(n^2/d) = O(n^2/\lg n)$ time on a real RAM. □
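The dominance reformulation can be checked with a brute-force sketch, assuming exact (integer) arithmetic so that the rearrangement of inequalities holds exactly. Dominating pairs are found here by testing all red/blue pairs, an $O(n^2 d)$ stand-in for Lemma 10, so the sketch demonstrates correctness of the reduction, not its speed. Instead of infinite boundary values it pads with a large finite sentinel chosen so that out-of-range terms can never attain a block maximum, and it computes all $2n - 1$ entries, which is why its blue points range over $j < n + d$. Names are hypothetical.

```python
def max_minus_conv_dominance(x, y, d):
    """All 2n - 1 entries of z_k = max_{i+j=k} (x_i - y_j) for equal-length integer vectors,
    read off from red/blue dominating pairs."""
    n = len(x)
    big = 4 * (max(map(abs, x)) + max(map(abs, y))) + 1

    def X(i):  # finite sentinel padding: out-of-range terms never reach a block maximum
        return x[i] if 0 <= i < n else -big

    def Y(j):
        return y[j] if 0 <= j < n else big

    z = [None] * (2 * n - 1)
    for delta in range(d):
        red = {i: [X(i + delta) - X(i + e) for e in range(d)] for i in range(0, n, d)}
        blue = {j: [Y(j - delta) - Y(j - e) for e in range(d)] for j in range(n + d)}
        for i, p in red.items():
            for j, q in blue.items():
                # Brute-force dominance test (stand-in for Lemma 10): p_{delta,i} >= q_{delta,j}.
                if all(pu >= qu for pu, qu in zip(p, q)):
                    a, b = i + delta, j - delta    # block maximum is x_a - y_b, with a + b = i + j
                    if 0 <= a < n and 0 <= b < n:
                        k = a + b
                        z[k] = x[a] - y[b] if z[k] is None else max(z[k], x[a] - y[b])
    return z


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
y = [2, 7, -1, 8, 2, -8, 1, 8, 2, -8]
print(max_minus_conv_dominance(x, y, 3))
```

Ties are harmless here: every term that ties for a block maximum dominates, and all of them record the same value.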

Combining Theorems 6 and 11, we obtain the following result:

Corollary 12. The $\ell_\infty$ necklace alignment problem can be solved in $O(n^2/\lg n)$ time on a real
RAM.

Although we will present a slightly faster algorithm for (min, −) convolution in the next subsection,
the approach described above will be useful later when we discuss the (median, −) convolution
problem.



4.4 (min, −) Convolution via Matrix Multiplication

Our next algorithm uses Chan's $O(n^3 (\lg\lg n)^3/\lg^2 n)$ algorithm for computing the (min, +) matrix
multiplication of two $n \times n$ matrices [10] (to which all-pairs shortest paths also reduces). We
establish a reduction from convolution to matrix multiplication.

Theorem 13. If we can compute the (min, −) matrix multiplication of two $n \times n$ matrices in $T(n)$
time, then we can compute the (min, −) convolution of two vectors of length $n$ in $O((n + T(\sqrt{n}))\sqrt{n})$
time.

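Proof. We claim that computing the (min, −) convolution $z = x \ast_{\min} y$ reduces to the following (min, −) matrix multiplication:

$$
P =
\begin{pmatrix}
x_0 & x_1 & \cdots & x_{\sqrt{n}-1} \\
x_{\sqrt{n}} & x_{\sqrt{n}+1} & \cdots & x_{2\sqrt{n}-1} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n-\sqrt{n}} & x_{n-\sqrt{n}+1} & \cdots & x_{n-1}
\end{pmatrix}
\ast_{\min}
\begin{pmatrix}
y_{\sqrt{n}-1} & y_{\sqrt{n}} & \cdots & y_{n-2} & y_{n-1} \\
y_{\sqrt{n}-2} & y_{\sqrt{n}-1} & \cdots & y_{n-3} & y_{n-2} \\
\vdots & \vdots & & \vdots & \vdots \\
y_1 & y_2 & \cdots & y_{n-\sqrt{n}} & y_{n-\sqrt{n}+1} \\
y_0 & y_1 & \cdots & y_{n-\sqrt{n}-1} & y_{n-\sqrt{n}}
\end{pmatrix}.
$$

The $(i,j)$th entry $p_{ij}$ of this product $P$ is

$$p_{ij} = \min_{m=0}^{\sqrt{n}-1} \bigl(x_{i\sqrt{n}+m} - y_{j+\sqrt{n}-1-m}\bigr).$$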

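Let $\hat{k} = \lfloor k/\sqrt{n} \rfloor \sqrt{n}$ denote the largest multiple of $\sqrt{n}$ that is at most $k$. Since $p_{ij}$ is the minimum of the $\sqrt{n}$ differences $x_a - y_b$ with $a + b = (i+1)\sqrt{n} + j - 1$ and $i\sqrt{n} \le a < (i+1)\sqrt{n}$, the entries $p_{i,\,k+1-(i+1)\sqrt{n}}$ for $0 \le i < \lfloor k/\sqrt{n} \rfloor$ cover exactly the terms $x_a - y_{k-a}$ with $a < \hat{k}$. Now, given the product $P$ above, we can compute each element $z_k$ of the convolution $z$ as follows:

$$z_k = \min\bigl\{p_{0,\,k+1-\sqrt{n}},\; p_{1,\,k+1-2\sqrt{n}},\; p_{2,\,k+1-3\sqrt{n}},\; \ldots,\; p_{\lfloor k/\sqrt{n} \rfloor - 1,\,k+1-\hat{k}},\;\; x_{\hat{k}} - y_{k-\hat{k}},\; x_{\hat{k}+1} - y_{k-\hat{k}-1},\; \ldots,\; x_k - y_0\bigr\}.$$

This min has $O(\sqrt{n})$ terms, and thus $z_k$ can be computed in $O(\sqrt{n})$ time. The entire vector $z$ can therefore be computed in $O(n\sqrt{n})$ time, given the matrix product $P$.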

It remains to show how to compute the rectangular product $P$ efficiently, given an efficient square-matrix (min, −) multiplication algorithm. We simply break the product $P$ into at most $\sqrt{n}$ products of $\sqrt{n} \times \sqrt{n}$ matrices: the left term is the entire left matrix, and the right term is a block submatrix. The number of blocks is $\lceil (n - \sqrt{n} + 1)/\sqrt{n} \rceil \le \sqrt{n}$. Thus the running time for the product is $O(T(\sqrt{n})\sqrt{n})$.

Summing the reduction cost and the product cost, we obtain a total cost of $O((n + T(\sqrt{n}))\sqrt{n})$. □
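The reduction can be checked on small inputs. The following Python sketch assumes $n$ is a perfect square and writes $r = \sqrt{n}$. It builds the two matrices from the proof, forms $P$ with a naive cubic (min, −) product (a hypothetical stand-in for the fast square-matrix routine; for brevity it multiplies the rectangular matrix in one call instead of splitting it into $\sqrt{n}$ square blocks), and then assembles each $z_k$ from $\lfloor k/r \rfloor$ entries of $P$ plus at most $r$ leftover differences. All names are illustrative.

```python
import math


def min_minus_conv_direct(x, y):
    """Quadratic reference: z_k = min_{i=0..k} (x_i - y_{k-i})."""
    return [min(x[i] - y[k - i] for i in range(k + 1)) for k in range(len(x))]


def min_minus_matmul(A, B):
    """Naive (min, -) product: C[i][j] = min_m (A[i][m] - B[m][j])."""
    cols = len(B[0])
    return [[min(a - B[m][j] for m, a in enumerate(row)) for j in range(cols)]
            for row in A]


def min_minus_conv_via_matmul(x, y):
    n = len(x)
    r = math.isqrt(n)
    if r * r != n:
        raise ValueError("this sketch assumes n is a perfect square")
    # Left factor (r x r): row i holds x_{ir}, ..., x_{ir + r - 1}.
    A = [[x[i * r + m] for m in range(r)] for i in range(r)]
    # Right factor (r x (n - r + 1)): entry (m, j) is y_{j + r - 1 - m}.
    B = [[y[j + r - 1 - m] for j in range(n - r + 1)] for m in range(r)]
    P = min_minus_matmul(A, B)  # p_ij = min_m (x_{ir+m} - y_{j+r-1-m})
    z = []
    for k in range(n):
        q = k // r  # k_hat = q * r
        # p_{i, k+1-(i+1)r} covers x indices ir .. ir + r - 1 on antidiagonal k
        terms = [P[i][k + 1 - (i + 1) * r] for i in range(q)]
        # leftover terms x_{k_hat} - y_{k - k_hat}, ..., x_k - y_0
        terms += [x[i] - y[k - i] for i in range(q * r, k + 1)]
        z.append(min(terms))
    return z


x = [5, 1, 4, 1, 5, 9, 2, 6, 5]
y = [3, 5, 8, 9, 7, 9, 3, 2, 3]
assert min_minus_conv_via_matmul(x, y) == min_minus_conv_direct(x, y)
```

Replacing `min_minus_matmul` with a genuinely fast square-matrix routine, applied to the column blocks of `B`, is what gives the bound of Theorem 13; the assembly loop is the $O(n\sqrt{n})$ part.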

Plugging in $T(n) = O(n^3/\lg n)$ from [9] allows us to obtain an alternative proof of Theorem 11. Plugging in $T(n) = O(n^3 (\lg\lg n)^3/\lg^2 n)$ from [10] immediately gives us the following improved result:

Corollary 14. The (min, −) convolution of two vectors of length $n$ can be computed in $O(n^2 (\lg\lg n)^3/\lg^2 n)$ time on a real RAM.
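For concreteness, the substitution behind Corollary 14 works out as follows, using $\lg\sqrt{n} = \tfrac12 \lg n$ and $\lg\lg\sqrt{n} = \lg\lg n - 1 = \Theta(\lg\lg n)$:

```latex
T(\sqrt{n})\,\sqrt{n}
  = O\!\left(\frac{n^{3/2}\,(\lg\lg\sqrt{n})^{3}}{\lg^{2}\sqrt{n}}\right)\sqrt{n}
  = O\!\left(\frac{n^{2}\,(\lg\lg n)^{3}}{\lg^{2} n}\right).
```

This term dominates the $O(n\sqrt{n})$ cost of the reduction itself.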






Combining Theorem 6 and Corollary 14, we obtain the following result:

Corollary 15. The necklace alignment problem can be solved in $O(n^2 (\lg\lg n)^3/\lg^2 n)$ time on a real RAM.

We remark that, by the reduction in Theorem 13, any nontrivial lower bound for (min, −) convolution would imply a lower bound for (min, −) matrix multiplication and the all-pairs shortest path problem.

5 $\ell_1$ Necklace Alignment and (median, +) Convolution

5.1 Reducing $\ell_1$ Necklace Alignment to (median, +) Convolution

First we show the relation between $\ell_1$ necklace alignment and (median, +) convolution. We need the following basic fact [2]:

Fact 16. For any vector $z = (z_0, z_1, \ldots, z_{n-1})$, $\sum_{i=0}^{n-1} |z_i + c|$ is minimized when $c = -\operatorname{median}_{i=0}^{n-1} z_i$.
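A quick numerical check of Fact 16 in Python. `statistics.median_low` is a standard-library function; the helper names `l1_cost` and `best_offset` are ours. Because $\sum_i |z_i + c|$ is convex and piecewise linear in $c$ with breakpoints at the values $-z_i$, comparing against every breakpoint certifies optimality.

```python
import statistics


def l1_cost(z, c):
    """Objective of Fact 16: sum_i |z_i + c|."""
    return sum(abs(zi + c) for zi in z)


def best_offset(z):
    """c = -median(z). For even len(z), every c between the two negated middle
    values is optimal; median_low simply picks one endpoint of that interval."""
    return -statistics.median_low(z)


z = [4, -1, 7, 2, 10, 3]
c = best_offset(z)
# The minimum of a convex piecewise-linear function is attained at a breakpoint.
assert l1_cost(z, c) == min(l1_cost(z, -v) for v in z)
print(c, l1_cost(z, c))  # -3 17
```

Note that the mean is not a substitute here: for $z = (0, 0, 10)$ the median gives cost 10, whereas shifting by the mean gives $40/3$.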

Instead of using (median, +) convolution directly, we use the equivalent form, (median, −) convolution:

Theorem 17. The $\ell_1$ necklace alignment problem can be reduced in $O(n)$ time to one (median, −) convolution.

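Proof. For two necklaces $x$ and $y$, we apply the (median, −) convolution to the following vectors, as in the proof of Theorem 6:

$$x' = (x_0, x_0, x_1, x_1, \ldots, x_{n-1}, x_{n-1}, \infty, -\infty, \infty, -\infty, \ldots, \infty, -\infty),$$
$$y' = (y_{n-1}, y_{n-1}, y_{n-2}, y_{n-2}, \ldots, y_0, y_0, y_{n-1}, y_{n-1}, y_{n-2}, y_{n-2}, \ldots, y_0, y_0).$$

The $(2(n+s')+1)$th entry of $x' \ast_{\mathrm{med}} y'$ is

$$\operatorname{median}_{i=0}^{2(n+s')+1} \bigl(x'_i - y'_{2(n+s')+1-i}\bigr) = \operatorname{median}_{i=0}^{n-1} \bigl(x_i - y_{(i-s'-1) \bmod n}\bigr),$$

because each difference $x_i - y_{(i-s'-1) \bmod n}$ appears exactly twice on the left, and the remaining $2s'+2$ terms consist of equally many $\infty$'s and $-\infty$'s, which do not move the median. This is $\operatorname{median}_{i=0}^{n-1}(x_i - y_{(i+s) \bmod n})$ if we let $s' = n - 1 - s$. Applying Fact 16, we can therefore minimize $\sum_{i=0}^{n-1} |x_i - y_{(i+s) \bmod n} + c|$ over $c$, for each $s \in \{0, 1, \ldots, n-1\}$. By brute force, we can minimize over $s$ as well using $O(n)$ additional comparisons and time. □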
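The index bookkeeping in this proof is easy to check mechanically. The Python sketch below builds $x'$ and $y'$, reads off the $(2(n+s')+1)$th entry of their (median, −) convolution by brute force, and compares it with the median of the rotated differences. It assumes $x$ and $y$ are already-prepared real vectors (the circular bookkeeping of Theorem 6 is not reproduced) and uses the lower median for even-length lists, which is our convention rather than one fixed by the text; all names are illustrative.

```python
INF = float("inf")


def build_vectors(x, y):
    """x' and y' from the proof, each of length 4n."""
    n = len(x)
    xp = [v for v in x for _ in range(2)] + [INF, -INF] * n
    y_rev2 = [v for v in reversed(y) for _ in range(2)]
    return xp, y_rev2 + y_rev2


def lower_median(values):
    vals = sorted(values)
    return vals[(len(vals) - 1) // 2]


def conv_entry(xp, yp, k):
    """k-th entry of the (median, -) convolution: median of x'_i - y'_{k-i}, i = 0..k."""
    return lower_median(xp[i] - yp[k - i] for i in range(k + 1))


def rotation_median(x, y, s):
    """median over i of x_i - y_{(i+s) mod n}."""
    n = len(x)
    return lower_median(x[i] - y[(i + s) % n] for i in range(n))


def l1_rotation_cost(x, y, s):
    """min over c of sum_i |x_i - y_{(i+s) mod n} + c|, taking c = -median (Fact 16)."""
    n = len(x)
    m = rotation_median(x, y, s)
    return sum(abs(x[i] - y[(i + s) % n] - m) for i in range(n))


x = [0.0, 0.1, 0.35, 0.5, 0.8]
y = [0.05, 0.3, 0.4, 0.7, 0.9]
xp, yp = build_vectors(x, y)
n = len(x)
for s in range(n):
    k = 2 * (n + (n - 1 - s)) + 1  # s' = n - 1 - s
    assert conv_entry(xp, yp, k) == rotation_median(x, y, s)
best_s = min(range(n), key=lambda s: l1_rotation_cost(x, y, s))
```

The final line is the $O(n)$ brute-force minimization over $s$; in the actual algorithm all $n$ medians come out of a single (median, −) convolution rather than $n$ separate sorts.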

Our results for (median, −) convolution use the following result of Frederickson and Johnson:

Theorem 18. [26] The median element of the union of $k$ sorted lists, each of length $n$, can be computed in $O(k \lg n)$ time and comparisons.
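The two primitives used together with Theorem 18 in the proofs below (selecting the median of a union of sorted lists, and locating that median in each list by binary search) can be sketched in Python as follows. The selection routine is a deliberately simple substitute built on `heapq.merge`, not Frederickson and Johnson's algorithm, so it does not achieve the stated $O(k \lg n)$ bound; the function names are illustrative.

```python
import bisect
import heapq
import itertools


def union_select(lists, r):
    """r-th smallest (0-based) element of the union of sorted lists.
    Stand-in only: a lazy k-way merge takes O(r lg k) time, not the
    O(k lg n) bound of Frederickson and Johnson's algorithm."""
    return next(itertools.islice(heapq.merge(*lists), r, None))


def union_median(lists):
    """Lower median of the union."""
    total = sum(len(L) for L in lists)
    return union_select(lists, (total - 1) // 2)


def split_counts(lists, m):
    """Per list: (# entries < m, # entries > m), by binary search in each list."""
    return [(bisect.bisect_left(L, m), len(L) - bisect.bisect_right(L, m))
            for L in lists]


lists = [[1, 4, 9], [2, 3, 10, 11], [5]]
m = union_median(lists)           # union is 1, 2, 3, 4, 5, 9, 10, 11
print(m, split_counts(lists, m))  # 4 [(1, 1), (2, 2), (0, 1)]
```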

5.2 (median, −) Convolution in Nonuniform Linear Decision Tree

We begin with our results for the nonuniform linear decision tree model: 

Theorem 19. The (median, — ) convolution of two vectors of length n can be computed in Oin^Jn lgn) 
time in the nonuniform linear decision tree model. 
Proof. As in the proof of Theorem 9, we sort the set $D = \{x_i - x_j,\, y_i - y_j : |i - j| < d\}$ of pairwise differences between nearby $x_i$'s and nearby $y_i$'s, where $d < n$ is a value to be determined later. By Theorem 7, this step requires $O(nd + n \lg n)$ comparisons between differences. These comparisons enable us to compare $x_i - y_{k-i}$ versus $x_j - y_{k-j}$ for free, provided $|i - j| < d$, because $x_i - y_{k-i} < x_j - y_{k-j}$ precisely if $x_i - x_j < y_{k-i} - y_{k-j}$, and both of these differences lie in $D$. In particular, we can sort each list

$$L_k(\lambda) = \bigl(x_i - y_{k-i} : i = \lambda, \lambda+1, \ldots, \min\{\lambda+d, n\} - 1\bigr)$$

for free. By Theorem 18, we can compute the median of $L_k(0) \cup L_k(d) \cup L_k(2d) \cup \cdots \cup L_k(\lfloor k/d \rfloor d)$, i.e., $\operatorname{median}_{i=0}^{k}(x_i - y_{k-i})$, in $O((k/d) \lg d) = O((n/d) \lg d)$ comparisons. Also, in the same asymptotic number of comparisons, we can binary search to find where the median fits in each of the $L_k(\lambda)$ lists, and therefore which differences are smaller and which differences are larger than the median.

This median is the $k$th entry of $x \ast_{\mathrm{med}} y$. Therefore, we can compute all $n$ entries of $x \ast_{\mathrm{med}} y$ in $O(nd + n \lg n + (n^2/d) \lg d)$ comparisons. This asymptotic running time is minimized when $nd = \Theta((n^2/d) \lg d)$, i.e., when $d^2/\lg d = \Theta(n)$. Substituting $d = \sqrt{n \lg n}$, we obtain a running time of $O(n\sqrt{n \lg n})$ in the nonuniform linear decision tree model. □
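The "for free" sorting step can be illustrated directly. In the Python sketch below, each block $L_k(\lambda)$ is sorted with a comparator that only ever compares a difference $x_i - x_j$ against a difference $y_{k-i} - y_{k-j}$, which is exactly the information the sorted set $D$ supplies (here we simply evaluate those comparisons instead of looking them up). The blocks are clipped at $i \le k$ as our own boundary convention, merging with `sorted` stands in for Theorem 18, and integer inputs keep the rearranged comparisons exact; all names are illustrative.

```python
import functools


def sorted_block(x, y, k, lam, d):
    """Sort L_k(lam) using only comparisons x_i - x_j vs y_{k-i} - y_{k-j}, |i - j| < d."""
    idx = range(lam, min(lam + d, len(x), k + 1))

    def cmp(i, j):
        dx, dy = x[i] - x[j], y[k - i] - y[k - j]
        # x_i - y_{k-i} < x_j - y_{k-j}  iff  x_i - x_j < y_{k-i} - y_{k-j}
        return (dx > dy) - (dx < dy)

    return [x[i] - y[k - i] for i in sorted(idx, key=functools.cmp_to_key(cmp))]


def med_minus_conv_blocked(x, y, d):
    """Lower median of the union of sorted blocks for each k (sorted() stands in for Theorem 18)."""
    z = []
    for k in range(len(x)):
        blocks = [sorted_block(x, y, k, lam, d) for lam in range(0, k + 1, d)]
        merged = sorted(v for b in blocks for v in b)
        z.append(merged[(len(merged) - 1) // 2])
    return z


def med_minus_conv_direct(x, y):
    """Quadratic reference: lower median of x_i - y_{k-i}, i = 0..k."""
    return [sorted(x[i] - y[k - i] for i in range(k + 1))[k // 2] for k in range(len(x))]


x = [7, 2, 9, 4, 4, 1, 8, 3]
y = [5, 6, 0, 2, 9, 1, 3, 7]
for d in (1, 2, 3, 5, 8):
    assert med_minus_conv_blocked(x, y, d) == med_minus_conv_direct(x, y)
```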

Combining Theorems 17 and 19, we obtain the following result:



Corollary 20. The $\ell_1$ necklace alignment problem can be solved in $O(n\sqrt{n \lg n})$ time in the nonuniform linear decision tree model.

5.3 (median, −) Convolution in Real RAM via Geometric Dominance

Now we turn to the analogous results for the real RAM: 

Theorem 21. The (median, −) convolution of two vectors of length $n$ can be computed in $O(n^2 (\lg\lg n)^2/\lg n)$ time on a real RAM.

Proof. Let $x$ and $y$ denote the two vectors of length $n$, and let $x \ast_{\mathrm{med}} y$ denote their (median, −) convolution. For each permutation $\pi$ on the set $\{0, 1, \ldots, d-1\}$, for each $i \in \{0, d, 2d, \ldots, \lfloor n/d \rfloor d\}$, and for each $j \in \{0, 1, \ldots, n-1\}$, we define the $(d-1)$-dimensional points

$$p_{\pi,i} = \bigl(x_{i+\pi(0)} - x_{i+\pi(1)},\; x_{i+\pi(1)} - x_{i+\pi(2)},\; \ldots,\; x_{i+\pi(d-2)} - x_{i+\pi(d-1)}\bigr),$$
$$q_{\pi,j} = \bigl(y_{j-\pi(0)} - y_{j-\pi(1)},\; y_{j-\pi(1)} - y_{j-\pi(2)},\; \ldots,\; y_{j-\pi(d-2)} - y_{j-\pi(d-1)}\bigr).$$

(To handle boundary cases, define $x_i = \infty$ and $y_j = -\infty$ for indices $i, j$ outside $[0, n-1]$.) For each permutation $\pi$, we apply Lemma 10 to the set of red points $\{p_{\pi,i} : i = 0, d, 2d, \ldots, \lfloor n/d \rfloor d\}$ and the set of blue points $\{q_{\pi,j} : j = 0, 1, \ldots, n-1\}$, to obtain all dominating pairs $(p_{\pi,i}, q_{\pi,j})$.

Point $p_{\pi,i}$ dominates $q_{\pi,j}$ precisely if $x_{i+\pi(\delta)} - x_{i+\pi(\delta+1)} > y_{j-\pi(\delta)} - y_{j-\pi(\delta+1)}$ for all $\delta \in \{0, 1, \ldots, d-2\}$ (ignoring the indices outside $[0, n-1]$). By rearranging terms, this condition is equivalent to $x_{i+\pi(\delta)} - y_{j-\pi(\delta)} > x_{i+\pi(\delta+1)} - y_{j-\pi(\delta+1)}$ for all $\delta \in \{0, 1, \ldots, d-2\}$, i.e., $\pi$ is a permutation sorting $(x_i - y_j, x_{i+1} - y_{j-1}, \ldots, x_{i+d-1} - y_{j-d+1})$ into decreasing order. If we substitute $j = k - i$, we obtain that $(p_{\pi,i}, q_{\pi,k-i})$ is a dominating pair precisely if $\pi$ is a sorting permutation of the list $L_k(i) = (x_i - y_{k-i},\, x_{i+1} - y_{k-i-1},\, \ldots,\, x_{\min\{i+d,n\}-1} - y_{k-\min\{i+d,n\}+1})$. Thus, the set of dominating pairs gives us the sorted order of $L_k(i)$ for each $i$ divisible by $d$ and for each $k$. Also, there can be at most $O(n^2/d)$ total dominating pairs $(p_{\pi,i}, q_{\pi,j})$ over all $i, j, \pi$, because there are $O(n/d)$ choices for $i$ and $O(n)$ choices for $j$, and if $(p_{\pi,i}, q_{\pi,j})$ is a dominating pair, then $(p_{\pi',i}, q_{\pi',j})$ cannot be a dominating pair for any permutation $\pi' \neq \pi$. (Here we assume that the sorted order is unique, which can be arranged by standard perturbation techniques or by breaking ties consistently [9].) Hence, the running time of the $d!$ executions of Lemma 10 is $d!\,2^{O(d)} n^{1+\varepsilon} + O(n^2/d)$ time, which is $O(n^2 \lg\lg n/\lg n)$ if we choose $d = \alpha \lg n/\lg\lg n$ for a sufficiently small constant $\alpha > 0$.

By Theorem 18, we can compute the median of $L_k(0) \cup L_k(d) \cup L_k(2d) \cup \cdots \cup L_k(\lfloor k/d \rfloor d)$, i.e., $\operatorname{median}_{i=0}^{k}(x_i - y_{k-i})$, in $O((k/d) \lg d) = O((n/d) \lg d)$ comparisons. Also, in the same asymptotic number of comparisons, we can binary search to find where the median fits in each of the $L_k(\lambda)$ lists, and therefore which differences are smaller and which differences are larger than the median. This median is the $k$th entry of $x \ast_{\mathrm{med}} y$. Therefore all $n$ entries can be computed in $O(n^2 (\lg d)/d) = O(n^2 (\lg\lg n)^2/\lg n)$ time on a real RAM. □
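The equivalence between dominance and sorting can be checked by brute force. The Python sketch below builds $p_{\pi,i}$ and $q_{\pi,k-i}$ for every one of the $d!$ permutations and keeps those for which dominance holds in every coordinate; with distinct differences and indices away from the boundary (both assumptions of the sketch), exactly one permutation survives, namely the one listing offsets in decreasing order of $x_{i+t} - y_{k-i-t}$. This exhaustive enumeration is precisely the work that Lemma 10's batched dominance reporting avoids; the function names are illustrative.

```python
import itertools


def points(x, y, perm, i, j):
    """p_{pi,i} and q_{pi,j} from the proof (all indices assumed in range)."""
    p = [x[i + perm[t]] - x[i + perm[t + 1]] for t in range(len(perm) - 1)]
    q = [y[j - perm[t]] - y[j - perm[t + 1]] for t in range(len(perm) - 1)]
    return p, q


def dominates(p, q):
    """Strict dominance in every coordinate."""
    return all(a > b for a, b in zip(p, q))


def dominating_perms(x, y, i, k, d):
    """All permutations pi of {0..d-1} for which p_{pi,i} dominates q_{pi,k-i}."""
    return [perm for perm in itertools.permutations(range(d))
            if dominates(*points(x, y, perm, i, k - i))]


def decreasing_order(x, y, i, k, d):
    """Offsets t sorted by x_{i+t} - y_{k-i-t}, largest first."""
    return tuple(sorted(range(d), key=lambda t: x[i + t] - y[k - i - t], reverse=True))


x = [3, 14, 1, 59, 26, 5]
y = [2, 7, 18, 28, 1, 8]
assert dominating_perms(x, y, 0, 4, 3) == [decreasing_order(x, y, 0, 4, 3)]
```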

Combining Theorems 17 and 21, we obtain the following result:

Corollary 22. The $\ell_1$ necklace alignment problem can be solved in $O(n^2 (\lg\lg n)^2/\lg n)$ time on a real RAM.

As before, this approach likely cannot be improved beyond $O(n^2/\lg n)$, because such an improvement would require an improvement to Lemma 10, which would in turn improve the fastest known algorithm for all-pairs shortest paths in dense graphs [10].

In contrast to (median, +) convolution, (mean, +) convolution is trivial to compute in linear time, because its sum separates into a prefix sum of $x$ and a prefix sum of $y$.
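A small Python sketch of that linear-time computation, taking the $k$th entry to be $\frac{1}{k+1}\sum_{i=0}^{k}(x_i + y_{k-i})$ for $k = 0, \ldots, n-1$ (our indexing assumption, matching the other convolutions above). Since $\sum_{i=0}^{k} y_{k-i} = \sum_{j=0}^{k} y_j$, two prefix-sum passes suffice; the function names are illustrative.

```python
from itertools import accumulate


def mean_plus_conv(x, y):
    """z_k = mean_{i=0..k} (x_i + y_{k-i}) in O(n) time via two prefix sums."""
    X = list(accumulate(x))  # X[k] = x_0 + ... + x_k
    Y = list(accumulate(y))  # Y[k] = y_0 + ... + y_k
    return [(X[k] + Y[k]) / (k + 1) for k in range(len(x))]


def mean_plus_conv_direct(x, y):
    """Quadratic reference."""
    return [sum(x[i] + y[k - i] for i in range(k + 1)) / (k + 1) for k in range(len(x))]


print(mean_plus_conv([1, 2, 3], [10, 20, 30]))  # [11.0, 16.5, 22.0]
```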

6 Conclusion 

The convolution problems we consider here have connections to many classic problems, and it would 
be interesting to explore whether the structural information extracted by our algorithms could be 
used to devise faster algorithms for these classic problems. For example, does the antidiagonal 
information of the $X + Y$ matrix lead to an $o(n^2 \lg n)$-time algorithm for sorting $X + Y$? We believe that any further improvements to our convolution algorithms would require progress on, and/or have interesting implications for, all-pairs shortest paths [9].

Our (min, −)-convolution algorithms give subquadratic algorithms for polyhedral 3SUM: given three lists, $A = (a_0, a_1, \ldots, a_{n-1})$, $B = (b_0, b_1, \ldots, b_{n-1})$, and $C = (c_0, c_1, \ldots, c_{2n-2})$, such that $a_i + b_j \le c_{i+j}$ for all $0 \le i, j < n$, decide whether $a_i + b_j = c_{i+j}$ for any $0 \le i, j < n$. This problem is a special case of 3SUM, and this special case has an $\Omega(n^2)$ lower bound in the 3-linear decision tree model [23]. Our results solve polyhedral 3SUM in $O(n^2/\lg n)$ time in the 4-linear decision tree model, and in $O(n\sqrt{n})$ time in the nonuniform 4-linear decision tree model, solving an open problem of Erickson [21]. Can these algorithms be extended to solve 3SUM in subquadratic time in the (nonuniform) decision tree model?
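The reduction behind the polyhedral 3SUM claim is short: $\max_{i+j=k}(a_i + b_j) = -\min_{i}\bigl((-a_i) - b_{k-i}\bigr)$, so one (max, +) convolution, equivalently a (min, −) convolution after negating $A$, decides the instance, since under the promise an equality on antidiagonal $k$ occurs exactly when the maximum reaches $c_k$. The Python sketch below uses a quadratic brute-force convolution as a placeholder for the subquadratic algorithms; only the reduction is the point, and the function names are illustrative.

```python
def max_plus_conv(a, b):
    """w_k = max over i + j = k of a_i + b_j, for k = 0..2n-2 (quadratic brute force)."""
    n = len(a)
    return [max(a[i] + b[k - i] for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
            for k in range(2 * n - 1)]


def polyhedral_3sum(a, b, c):
    """Decide whether a_i + b_j = c_{i+j} for some i, j, given a_i + b_j <= c_{i+j}."""
    w = max_plus_conv(a, b)
    if any(wk > ck for wk, ck in zip(w, c)):
        raise ValueError("promise a_i + b_j <= c_{i+j} is violated")
    # Every a_i + b_j on antidiagonal k is <= c_k, so equality occurs iff w_k == c_k.
    return any(wk == ck for wk, ck in zip(w, c))


a, b = [0, 1, 2], [0, 1, 2]
print(max_plus_conv(a, b))                     # [0, 1, 2, 3, 4]
print(polyhedral_3sum(a, b, [1, 2, 3, 4, 5]))  # False
print(polyhedral_3sum(a, b, [1, 2, 2, 4, 5]))  # True
```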